ICAP: AN INTERACTIVE CLUSTER ANALYSIS PROCEDURE FOR ANALYZING REMOTELY SENSED DATA


I. INTRODUCTION 

The LANDSAT Multispectral Scanner measures the intensity of radiation reflected by the
earth's surface in four spectral bands at a ground resolution of approximately 80 m. Ground
objects reflect radiation in a characteristic pattern of intensities, according to the object's
physical properties. This pattern may be defined in terms of radiance means and a covariance
matrix (i.e., training statistics) for a particular cover type. These statistics may then be used
to train a classifier which recognizes patterns in a new environment by classifying the radiance
data for each resolution element (pixel) into one of the pattern classes (cover types) under
consideration. A thematic map can be produced to show the spatial distribution of the categories
identified. Such maps can provide valuable information for use in mapping and monitoring natural
resources.


Training statistics describing various land cover types can be developed using a supervised or
unsupervised approach. Supervised methods involve the derivation of signature statistics from the
analysis of picture elements within areas of spectral uniformity. These "training" areas must be
located for each land cover category of interest. It may sometimes be difficult or impossible to
specify a full list of the categories to be identified or to define training areas for all of the
important features in a scene, especially for small, irregular, or sparsely distributed features.
Unsupervised methods such as cluster analysis can be used to estimate training statistics without
the use of training areas and to map features in a scene without predetermining their identity.


The purpose of cluster analysis is to group data with a minimum of a priori knowledge. Since
it is improbable that a universal objective clustering criterion exists (Fukunaga and Koontz, 1970),
many different clustering approaches have been defined. Anderberg (1973) gives comprehensive
coverage of the theoretical background and methodologies of cluster analysis. Hartigan (1975)
presents program listings and describes various clustering and related algorithms. Dubes and Jain
(1976) tested and compared eight representative clustering programs and listed guidelines for
program selection by potential users.

II. CLUSTERING METHODS USED IN REMOTE SENSING 

Procedures used to cluster remotely sensed data can be divided into two groups based upon
the methods used to control the clustering process. Those used by Turner (1972), Su and Cummings
(1972), Kan et al. (1973), and the ISODATA algorithm as used by Zobrist (1976) require
that the user manually specify various parameters to control the clustering process. These
parameters are varied and the programs run in an iterative fashion until the output set of clusters
meets the analyst's criteria.

Other procedures given by Leboucher and Lowitz (1976), Borriello and Capozza (1974), Eigen
et al. (1974), Fromm and Northouse (1976), and Goldberg and Shlien (1978) require a minimum
of user input or determine the control parameters automatically from the data itself. This
automatic group of procedures is most effective in producing an initial scene classification, since
the analyst is presumed to be unfamiliar with the scene and cannot intelligently select control
parameters.


Most cluster analysis procedures used to process remotely sensed data invoke an iterative
two-step process. The first step deals with centroid location and cluster formation or growth. The
information relevant to this initial step is quantitative, since all of the entities to be
manipulated are expressed numerically. A set of numerical rules is defined to regulate the
formation of new centroids and to determine which data points will be assigned to a given centroid.
For example, the creation of new centroids can be controlled by defining a threshold distance from
all existing centroids that a candidate point must exceed before becoming a new centroid. The
minimum Euclidean distance criterion can be used to determine the point membership of each
centroid: a data point is assigned to the cluster whose centroid is nearest to that point in
P-space, where P is the dimensionality of the data set.
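
The formation step described above can be sketched in a few lines of Python. This is only an
illustration under stated assumptions, not code from any of the cited programs: the function names
`euclidean` and `form_clusters` are hypothetical, centroids are held fixed once created, and a
single starting centroid is supplied by the caller.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two points in P-space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def form_clusters(points, threshold, initial_centroid):
    """One formation pass (hypothetical helper, for illustration only)."""
    centroids = [tuple(initial_centroid)]
    # A candidate becomes a new centroid only if it lies farther than
    # `threshold` from every existing centroid.
    for p in points:
        if all(euclidean(p, c) > threshold for c in centroids):
            centroids.append(tuple(p))
    # Minimum Euclidean distance rule: each point joins its nearest centroid.
    labels = [
        min(range(len(centroids)), key=lambda k: euclidean(p, centroids[k]))
        for p in points
    ]
    return centroids, labels

points = [(0, 0), (1, 0), (10, 0), (11, 0)]
centroids, labels = form_clusters(points, threshold=5, initial_centroid=(0, 0))
```

With a threshold of 5, only the point (10, 0) is far enough from the starting centroid to found a
new cluster, and the four points divide between the two centroids.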



The second logical step within an iteration is the evaluation of the clusters produced in
the first step. Once formed, clusters must be evaluated to determine if the present configuration
is optimal or whether modifications are necessary. Most procedures define a fixed set of criteria
by which clusters are evaluated and subsequently modified. For example, the ISODATA algorithm
(Ball and Hall, 1965) is designed to split any cluster whose standard deviation exceeds a split
threshold, delete any cluster with fewer than a specified number of members, and lump together
cluster pairs whose centroids are less than a specified distance apart. The various thresholds are
determined by the analyst.
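
The fixed evaluation criteria just described can be illustrated with a short Python sketch. It is
not the ISODATA implementation: the function name `evaluate_clusters`, the dictionary layout of a
cluster, and the choice to test the largest per-dimension standard deviation against the split
threshold are all assumptions made for illustration.

```python
import math

def evaluate_clusters(clusters, split_sd, min_members, lump_dist):
    """Flag clusters for splitting, deletion, and pairwise lumping.

    clusters: list of dicts with keys 'centroid' (tuple), 'sd' (tuple of
    per-dimension standard deviations), and 'n' (member count).
    This layout is hypothetical.
    """
    # Split: standard deviation exceeds the split threshold
    # (largest dimension used here, which is an assumption).
    split = [i for i, c in enumerate(clusters) if max(c["sd"]) > split_sd]
    # Delete: fewer members than the specified minimum.
    delete = [i for i, c in enumerate(clusters) if c["n"] < min_members]
    # Lump: centroid pairs closer together than the specified distance.
    lump = []
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            d = math.dist(clusters[i]["centroid"], clusters[j]["centroid"])
            if d < lump_dist:
                lump.append((i, j))
    return split, delete, lump

clusters = [
    {"centroid": (0, 0), "sd": (1, 6), "n": 50},
    {"centroid": (1, 0), "sd": (1, 1), "n": 3},
    {"centroid": (20, 0), "sd": (2, 2), "n": 40},
]
split, delete, lump = evaluate_clusters(clusters, split_sd=5, min_members=5, lump_dist=2)
```

Every cluster is judged against the same three thresholds, which is the collective, parameter-driven
style of evaluation that the text contrasts with direct manipulation by the analyst.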

A disadvantage of these indirect evaluation methods (indirect in the sense that the analyst
manipulates parameters rather than the clusters) is that no one set of rules can be defined to cover
all of the possible analytical objectives of the data analysis. In addition, the analyst cannot
effectively translate prior information about the category structure into the selection of control
parameters. Consider a situation in which the objective is to map different types of forested
areas, such as hardwoods or conifers, within a scene. Ideally, the analyst could encourage the
development of forest signatures by focusing attention on clusters whose centroids resemble typical
forest responses and suppress clusters which appear to belong to irrelevant categories. Such a
selective clustering process cannot be performed by existing procedures, since the clusters are
collectively evaluated according to fixed criteria.

III. THE ICAP ALGORITHM

An Interactive Cluster Analysis Procedure (ICAP) was developed to avoid the inflexibility
imposed by fixed cluster evaluation criteria, via a direct evaluation process in which each cluster
is appraised and modified independently of the other clusters. ICAP combines the rapid numerical
processing capacity of the computer with the human ability to integrate qualitative information to
form a supervised clustering procedure. Control of the clustering process alternates between ICAP,
which examines data, locates new centroids, and forms clusters, and the analyst, who can request
a cluster summary table and determine and execute the modifications, if any, to be made to the
cluster configuration.

This shared control approach has two major advantages. First, ICAP does not have to optimize the
cluster configuration, which simplifies the program and reduces its execution time. Second,
effective use is made of subjective judgement: since the analyst's judgement becomes an integral
part of the clustering process, qualitative information can be used as a natural part of the
analysis.

The methodology used in ICAP combines the concept of a cluster acceptance region (Mucciardi
and Gose, 1972) with cluster manipulation techniques adopted from the ISODATA algorithm
(Ball and Hall, 1965), and incorporates them into an interactive scheme. ICAP can be logically
divided into three stages:

1. Data Preprocessing. The data are examined and the overall distance threshold (ODT)
is computed. The ODT is used to control the resolution (number and relative size) of the
clusters to be produced in Supervised Clustering (SCLUS). If initial centroids are not
specified, the mean of the scanned data is used as a starting centroid.

2. Supervised Clustering (SCLUS). Control of the clustering process alternates between
ICAP, which scans the data, locates new centroids, and forms clusters, and the analyst, who
can evaluate and elect to modify the cluster structure. Thus, the analyst interacts with ICAP
and controls the frequency of this interaction by specifying the maximum number of data
points to be processed at once. The capability of modifying the cluster structure after
processing arbitrarily sized segments of the data enables the analyst to closely supervise the
clustering process. Clusters can be deleted or lumped together pairwise, and new centroids can
be added. A summary of the cluster statistics can be requested to facilitate cluster
manipulation.

3. Data Classification (DCLASS). The data are classified using centroids which remain
fixed for a complete pass through the data. After each pass, new centroids are computed
to be the mean of their respective clusters. In addition to the modifications listed in SCLUS,
the analyst can elect to split clusters.

A data set need only be preprocessed once. Stages 2 and 3 can be used to iteratively perform
a global-local analysis similar to the approach proposed by Northouse et al. (1973). The
methods of approach used in the three stages are described below.


Data Preprocessing 

This stage locates the initial data centroid(s) and computes an overall distance threshold
(ODT). The data are scanned and the sample mean, standard deviation, and maximum and minimum
responses are computed for each of the P dimensions of the data. Upper and lower bounds
are located on each dimension of the data to include the main concentration of data and to
exclude outliers. These bounds are given by the dimension mean plus or minus 2.5 standard
deviations. This interval should include approximately 99 percent of the data, assuming the data
are normally distributed. If either computed bound exceeds the actual range of the data, the
appropriate bound is reset to be the actual maximum or minimum response. The volume (V) of the
data is found by taking the product of the dimensional ranges.


ODT is a function of V, the approximate volume of the data space excluding outliers, and R,
the user-defined resolution or desired number of clusters to be examined in SCLUS (equation 1):

$$ \mathrm{ODT} = \left( \frac{V}{R} \right)^{1/P} \qquad (1) $$


where P is the dimensionality of the data. Conceptually, ODT is the side length of a
hypercubical cell selected such that V can be partitioned into R such cells. ODT is also equal to
the minimum distance between the centers of neighboring hyperspheres inscribed within the
hypercubes. It is used in SCLUS to define the radius of a hyperspherical acceptance region which is
centered about each centroid. All data points within an acceptance region are joined to the
appropriate cluster. Data points outside all acceptance regions become the initial centroids for
new clusters. For uniformly distributed data, this scheme should allow approximately R clusters to
be generated in SCLUS. It can be expected in practice that more than R clusters will be produced,
since outlier points would form additional clusters, and because the ODT is individually weighted
for each cluster.

This procedure does not attempt to optimize the computation of the ODT beyond identifying
reasonable ranges in each dimension, nor does it attempt to detect clusters which violate the
assumptions made about the cell structure. The initial centroid(s) can be supplied by the analyst
or the mean of the scanned points may be used. Figure 1 illustrates the above computations in a
simple two-dimensional case.
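
As a rough illustration of the preprocessing arithmetic, the following Python sketch computes the
clipped bounds, the volume V, and the ODT for a small data set. The function name
`overall_distance_threshold` is hypothetical, and the use of the n - 1 denominator for the sample
standard deviation is an assumption, since the text does not specify it.

```python
import math

def overall_distance_threshold(data, resolution, k=2.5):
    """Return (ODT, bounds) for a list of P-dimensional points.

    Hypothetical helper: bounds are mean +/- k standard deviations,
    clipped to the observed minimum and maximum in each dimension.
    """
    n = len(data)
    p = len(data[0])
    volume = 1.0
    bounds = []
    for d in range(p):
        column = [row[d] for row in data]
        mean = sum(column) / n
        # Sample standard deviation (n - 1 denominator is an assumption).
        sd = math.sqrt(sum((x - mean) ** 2 for x in column) / (n - 1))
        lower = max(mean - k * sd, min(column))
        upper = min(mean + k * sd, max(column))
        bounds.append((lower, upper))
        volume *= upper - lower  # V is the product of the dimensional ranges
    odt = (volume / resolution) ** (1.0 / p)  # ODT = (V / R)^(1/P)
    return odt, bounds

data = [(0, 0), (10, 0), (0, 10), (10, 10)]
odt, bounds = overall_distance_threshold(data, resolution=4)
```

In this toy case both computed bounds fall outside the observed range and are reset to the actual
minimum and maximum, so V = 10 x 10 = 100 and, with R = 4 and P = 2, ODT = (100/4)^(1/2) = 5.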

Supervised Clustering (SCLUS)

SCLUS requires an overall distance threshold (ODT) and at least one initial centroid. These
parameters can be supplied by the analyst if known a priori or can be determined by preprocessing
the data. Hyperspherical acceptance regions are centered about the cluster centroids with radii
equal to ODT times the local cluster density (described below) for each cluster. Each data point
within a segment is examined in turn. If the point falls within the acceptance region of a centroid,
it is grouped with that centroid. Otherwise, the point becomes a new centroid and immediately
begins to accumulate its own points. This method of centroid determination tends to promote
a fairly uniform distribution of centroids over the data space.

Cluster proliferation is encouraged in areas of relatively low cluster density and inhibited in
areas of high cluster density by weighting the ODT by the local cluster density. This selectively
changes the acceptance region size. The local cluster density for the ith cluster is equal to the
average distance between the ith centroid and all other centroids, divided by the average distance
between all centroid pairs. This ratio is greater than unity for regions with high cluster density
and less than unity for low density regions.

After each data segment is processed, a listing can be requested to summarize the current
cluster configuration. Statistics (see Table II) including the centroid locations, number of member
points, index of the nearest and farthest centroid, distance to the nearest centroid, and the average
distance to other centroids are given to help the analyst determine which modifications, if any, are
necessary. Based upon this evaluation, the analyst can elect to lump clusters together by pairs,
delete clusters, add new centroids, or leave the configuration as is. Any modification of the
cluster structure within an iteration makes it impossible to compute the cluster standard deviation.
Since the standard deviation is used as a criterion for cluster splitting, the option to split
clusters is deferred to the DCLASS stage. The analyst may perform any combination of the above
modifications as long as sufficient clusters remain to be manipulated. Additional summaries can be
requested to aid this process. Upon completion of the modifications, control is returned to ICAP,
which then continues to process additional segments and alternate control with the analyst until
all of the scene has been examined.
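
The processing of a single SCLUS segment might be sketched as follows. This is a simplified
illustration, not the APL implementation: the names `local_density` and `sclus_segment` are
hypothetical, the density weight is computed literally as the ratio worded in the text, centroids
are kept fixed while they accumulate points, and a point lying inside several acceptance regions is
given to the nearest centroid. The last two choices are assumptions.

```python
import math

def local_density(i, centroids):
    """Mean distance from centroid i to the others, divided by the mean
    distance over all centroid pairs (1.0 when fewer than two centroids)."""
    m = len(centroids)
    if m < 2:
        return 1.0
    to_others = sum(
        math.dist(centroids[i], c) for j, c in enumerate(centroids) if j != i
    ) / (m - 1)
    pairs = [
        math.dist(centroids[a], centroids[b])
        for a in range(m) for b in range(a + 1, m)
    ]
    mean_pair = sum(pairs) / len(pairs)
    return to_others / mean_pair if mean_pair > 0 else 1.0

def sclus_segment(segment, centroids, counts, odt):
    """Process one data segment; extends `centroids` and `counts` in place."""
    for p in segment:
        best = None
        for i, c in enumerate(centroids):
            radius = odt * local_density(i, centroids)  # weighted acceptance radius
            d = math.dist(p, c)
            if d <= radius and (best is None or d < best[0]):
                best = (d, i)
        if best is None:
            centroids.append(tuple(p))  # outside every region: new centroid
            counts.append(1)
        else:
            counts[best[1]] += 1        # inside a region: join that cluster
    return centroids, counts

centroids, counts = sclus_segment(
    [(1, 0), (10, 0), (10.5, 0), (0, 1)], [(0.0, 0.0)], [0], odt=3
)
```

After the segment has been processed, control would pass back to the analyst, who could inspect the
centroids and counts and then lump, delete, or add clusters before the next segment is read.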

Data Classification (DCLASS) 

DCLASS requires an input set of centroids and does not allow any change in the number or
position of the centroids during one complete pass through the data. Cluster memberships are
determined by the minimum Euclidean distance rule, subject to the constraint that a point must
be no farther than DNC from its nearest centroid to be joined to that centroid's cluster. DNC
is the distance from the centroid under consideration to its nearest neighboring centroid. This
constraint prevents outlier data from being joined to inappropriate clusters. After each pass, new
centroids are computed to be the mean of their respective clusters. DCLASS can be run in an
iterative fashion until the process converges; that is, until there is no significant point
reallocation among clusters between subsequent passes.
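
A single DCLASS pass can be sketched in Python as shown here. The function name `dclass_pass` is
hypothetical, and two details are assumptions rather than statements from the text: points rejected
by the DNC constraint are labelled `None`, and a cluster that receives no points keeps its old
centroid.

```python
import math

def dclass_pass(points, centroids):
    """One pass with fixed centroids; returns (labels, new_centroids)."""
    m = len(centroids)
    # DNC for each centroid: distance to its nearest neighbouring centroid.
    dnc = [
        min(
            (math.dist(centroids[i], centroids[j]) for j in range(m) if j != i),
            default=math.inf,
        )
        for i in range(m)
    ]
    labels = []
    for p in points:
        dists = [math.dist(p, c) for c in centroids]
        k = min(range(m), key=dists.__getitem__)  # minimum Euclidean distance rule
        # Outliers farther than DNC from their nearest centroid stay unassigned.
        labels.append(k if dists[k] <= dnc[k] else None)
    new_centroids = []
    for k in range(m):
        members = [p for p, lab in zip(points, labels) if lab == k]
        if members:
            new_centroids.append(
                tuple(sum(col) / len(members) for col in zip(*members))
            )
        else:
            new_centroids.append(tuple(centroids[k]))  # empty cluster: keep centroid
    return labels, new_centroids

points = [(1, 0), (-1, 0), (3, 0), (5, 0), (20, 0)]
labels, new_centroids = dclass_pass(points, [(0, 0), (4, 0)])
```

Here the point (20, 0) lies 16 units from its nearest centroid while DNC is 4, so it is left
unclassified. Repeating the pass with `new_centroids` until `labels` stops changing corresponds to
the convergence test described above.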

The standard deviation, ADG, and ADL are computed for each dimension of all clusters.
ADG is the distance from the centroid to the mean of all points in the cluster greater than the
centroid. ADL is the corresponding distance for all points less than the centroid. A cluster
summary identical to that described in SCLUS and the cluster standard deviations are listed. The
analyst can direct that certain clusters be split, based on the information provided. ICAP splits a
cluster by first defining two new centroids which are identical to the original except in the
dimension to be split. The values for this dimension are determined by adding the ADG to and
subtracting the ADL from the original centroid value. In addition to cluster splitting, the
modifications detailed for SCLUS can also be performed.
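
The splitting rule can be written out as a short Python sketch. The function name `split_cluster`
is hypothetical, and the handling of a dimension with no points on one side of the centroid (the
offset is taken as zero) is an assumption.

```python
def split_cluster(members, centroid, dim):
    """Split a cluster along dimension `dim`; returns the two new centroids."""
    greater = [p[dim] for p in members if p[dim] > centroid[dim]]
    less = [p[dim] for p in members if p[dim] < centroid[dim]]
    # ADG: distance from the centroid to the mean of the points above it.
    adg = sum(greater) / len(greater) - centroid[dim] if greater else 0.0
    # ADL: distance from the centroid to the mean of the points below it.
    adl = centroid[dim] - sum(less) / len(less) if less else 0.0
    upper = list(centroid)
    lower = list(centroid)
    upper[dim] = centroid[dim] + adg  # identical except in the split dimension
    lower[dim] = centroid[dim] - adl
    return tuple(upper), tuple(lower)

members = [(0, 5), (2, 5), (8, 5), (10, 5)]
upper, lower = split_cluster(members, centroid=(5, 5), dim=0)
```

In this example the points above the centroid average 9 and those below average 1 in dimension 0,
so ADG = ADL = 4 and the new centroids are placed at 9 and 1 in that dimension.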

Selection of R and SCLUS Segment Sizes 

A goal of the analysis is the recognition and location of natural groups within the data. De- 
pending upon the resolution factor R used in ICAP, a given natural group may be represented by 
several clusters, by one cluster, or it may share a cluster with other natural groups. In the second 
case, no corrective action is necessary. The error in the first case can be corrected by lumping 
clusters together, and the error in the third case can be corrected by splitting clusters. 

A logical method of lumping clusters would be to join the pair with nearest centroids as
determined from examination of the pairwise distances between all centroids. The number of
computations required for this correction is a function of the number of clusters. Candidates for
splits can be identified by reviewing the standard deviation for each dimension of all clusters.
The number of computations is a function of the number of data points. Since the number of
clusters is usually much less than the number of data points, the splitting operation uses more
computer resources than the lumping operation. The need for splitting clusters can be largely
eliminated in SCLUS by selecting R to be somewhat larger than the expected number of clusters.
An R of 1.5 to 2.0 times the desired number of clusters was used in the ICAP tests reported in
this paper.
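
The nearest-pair lumping operation could look like the sketch below. The function names
`nearest_pair` and `lump` are hypothetical, and placing the merged centroid at the member-weighted
mean of the two originals is an assumption, since the text does not say how the lumped centroid is
positioned.

```python
import math

def nearest_pair(centroids):
    """Return (i, j, distance) for the closest pair of centroids."""
    best = None
    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            d = math.dist(centroids[i], centroids[j])
            if best is None or d < best[2]:
                best = (i, j, d)
    return best

def lump(centroids, counts, i, j):
    """Merge clusters i and j; the merged cluster is appended at the end."""
    n = counts[i] + counts[j]
    # Member-weighted mean of the two centroids (an assumption).
    merged = tuple(
        (a * counts[i] + b * counts[j]) / n
        for a, b in zip(centroids[i], centroids[j])
    )
    new_centroids = [c for k, c in enumerate(centroids) if k not in (i, j)] + [merged]
    new_counts = [c for k, c in enumerate(counts) if k not in (i, j)] + [n]
    return new_centroids, new_counts

centroids = [(0, 0), (10, 0), (1, 0)]
counts = [3, 5, 1]
i, j, d = nearest_pair(centroids)
merged_centroids, merged_counts = lump(centroids, counts, i, j)
```

The search touches only centroid pairs, so its cost depends on the number of clusters and not on
the number of data points, which is the reason given above for preferring lumping to splitting.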

The analyst controls the frequency of interaction within SCLUS by specifying that the image
be processed by segments. The capability of examining and modifying the cluster structure at
varying intervals within one pass of the data allows the analyst to monitor the formation of new
centroids and subsequent cluster growth. The principal advantage of this approach is that unwanted
clusters can be promptly eliminated. This improves the efficiency of the clustering process, since
the number of centroids to be examined is reduced.

The maximum rate of centroid proliferation can be expected during the initial stages of data
processing. This rate should diminish as the number of existing centroids increases. To prevent the
formation of too many centroids at once, the initial segments should be relatively small compared
to the size of the data set (i.e., the smaller of 500 points or 5 percent of the data set size). The
segment size should then be gradually increased during the latter stages of processing. Although
the segment size selection is an arbitrary process, a rule of thumb can be given. Experience from
testing ICAP has shown that 3 to 10 new centroids is a "comfortable" number to consider after
segment processing. Let LSEG be the number of points processed in the last segment, and NCEN
be the number of new centroids created. If NCEN is less than 3, the next segment size should be
twice LSEG. If NCEN is greater than 10, the next segment size should be half LSEG.
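
The rule of thumb translates directly into a small Python function. The names below are
hypothetical, the initial size is taken as the smaller of 500 points or 5 percent of the data set
(an assumed reading of the text), and leaving the segment size unchanged when NCEN falls between 3
and 10 is likewise an assumption.

```python
def initial_segment_size(n_points):
    """Smaller of 500 points or 5 percent of the data set (at least 1)."""
    return max(1, min(500, n_points // 20))

def next_segment_size(lseg, ncen, low=3, high=10):
    """Adjust the segment size from the number of new centroids just created."""
    if ncen < low:
        return 2 * lseg           # too few new centroids: double the segment
    if ncen > high:
        return max(1, lseg // 2)  # too many new centroids: halve the segment
    return lseg                   # comfortable range: keep the size (assumption)

sizes = [next_segment_size(500, ncen) for ncen in (2, 5, 12)]
```

For a last segment of 500 points, the three cases give next segments of 1000, 500, and 250 points
respectively.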

IV. IMPLEMENTATION AND TESTING OF ICAP 

The ICAP algorithm is designed to function in an interactive mode in which the analyst
directly interacts with the computer, supplying input at the request of the program and receiving
output as it is computed. The procedure is coded in APL (A Programming Language), which supports
this interaction. APL, originally developed by Iverson (1962), is a concise and powerful language
in which operations on single items (scalars) extend naturally to matrices of any size and shape. A
large number of operators enable single APL instructions to perform operations requiring many
statements in other languages. Single instructions can be combined into expressions that can be
grouped into APL programs. Thus, lengthy procedures in other languages can often be succinctly
expressed in APL with far fewer lines of code. The use of APL is described by Gilman and Rose
(1976). ICAP was implemented on an IBM 370/3033 computer at the Pennsylvania State University,
University Park, Pa. Various programs from a software system developed by the Office for
Remote Sensing of Earth Resources (ORSER) at the Pennsylvania State University (Turner
et al., 1978) were used to evaluate ICAP's performance.

Two different Landsat scenes were used to test ICAP's clustering abilities. The first, in which
the analyst was assumed to have no prior knowledge of the data, required an initial categorization
type of analysis in which the clusters were formed more or less automatically with a minimum
of user input. The second, in which the analyst was assumed to have partial knowledge of the
important groups in the data, employed a selective clustering type of analysis. Using this approach,
the analyst focused attention on and enhanced the development of clusters of interest and inhibited
the development of clusters of little interest. The testing of the selective clustering approach is
described in detail, since it better illustrates the interactive use of ICAP.

A. Selective Clustering 

The data used in this test are from an unpublished study by Turner (1978) which described the
mapping of gypsy moth forest defoliation damage in central Pennsylvania using two merged scenes
of Landsat imagery. The July 19, 1976 Landsat scene (data dimensions 5 to 8) had no defoliation.
The June 19, 1977 scene (data dimensions 1 to 4) showed defoliation. The two scenes were
geometrically corrected and registered to one another using the VICAR image processing program
package at the NASA Goddard Space Flight Center, Greenbelt, Md. The test site included a mountain
covered by hardwood forest, surrounded by agricultural lands. Since the goal of this analysis was
to map canopy defoliation, the non-forest areas were not considered when developing training
statistics or assessing classification accuracy. It was known beforehand that hardwood forest
vegetation at the test site had a typical response of about 16, 14, 52, and 35 in Landsat bands 4,
5, 6, and 7, respectively, on both dates.

The reference signatures for the accuracy comparison were developed using a supervised
analysis. Training statistics were derived from training areas covering healthy, moderately
defoliated, and severely defoliated forest. These training areas were located through the use of
the ORSER Uniformity Mapping Program, UMAP (Turner et al., 1978), in conjunction with U-2 color
aerial photography. Although no quantitative accuracy assessment was performed, the thematic map
produced by classifying the scene with the reference signatures using the ORSER minimum
Euclidean distance classifier, CLASS (Turner et al., 1978), appeared to correspond to the U-2
photography. A description of the analysis performed with the ICAP and CLUS programs is given below.

ICAP Analysis 

The data were first preprocessed to determine the overall distance threshold and to locate
an initial centroid (Table I). It was believed that 4 to 6 categories were sufficient to map
subclasses within the forest canopy category. A larger resolution factor of 10 was selected to
reduce the potential for cluster splitting.

The SCLUS stage was used to locate an initial data partition. A cluster summary was requested 
after each segment was processed to determine what modifications might be necessary. Eight 
centroids were grown during the processing of the first segment which contained 500 points. The
cluster summary is listed in Table II. 

The forest clusters, recognized on the basis of a priori information, were always left unchanged. 

At this point, the major task of the analyst was to limit the number of non-forest clusters. This
was done by lumping together similar non-forest cluster pairs. For example, clusters 6-9 in Table II
seemed to be forest clusters and were not altered. The similar non-forest cluster pairs (1, 5) and
(2, 3) were lumped together. Nine clusters remained after the last segment was processed. Seven
of these belonged to the forest category. The other two clusters appeared to typify the non-forest
category response (believed to be agricultural lands) and were retained in the analysis. This
was done to limit the proliferation of spurious non-forest clusters, since non-forest responses
would more likely be grouped with either of these two categories rather than cause new centroids
to be created.

An additional pass through the data was made using DCLASS to refine the centroids produced
in SCLUS (Table III). Clusters 6-9 appeared to be non-forest and the pairs (6, 8) and (7, 9) were
lumped together. The three forest clusters, 1, 2, and 4, with the highest standard deviation were
split in dimensions 7, 3, and 7, respectively, to form additional forest clusters. Another pass
using DCLASS was made to refine the new centroids. The change in point allocation among clusters
was judged to be minor and the ICAP clustering was terminated. The ICAP analysis took about
40 minutes of user time to complete and used 103 seconds of CPU time.

CLUS Analysis 

The scene was also clustered with the ORSER CLUS program, using the default parameters 
described in the program documentation (Turner et al. 1978). It was necessary to run the program 
three times, adjusting the control parameters according to suggested guidelines in the documen- 
tation, until a satisfactory classification map was obtained. The CLUS analysis took about 10 
minutes of user time to complete and used 66 seconds of CPU time. 

Comparison of Results 

The ORSER program CLASS was used to produce character classification maps for the reference,
ICAP, and CLUS signatures. The performance of ICAP and CLUS was assessed by noting the
number of pixels classified as being in agreement with the reference map. The ORSER program
MAPCOMP (Turner et al., 1978) was used to automate this comparison. The MAPCOMP program
compares two character maps element by element and produces a comparison map and accompanying
summary tables. Any differences in the number of categories between the test and reference
maps were resolved by adjusting the symbols used to indicate a particular category. The severe
and moderate defoliation categories were assigned unique mapping symbols. Other areas were
ignored and mapped as blanks.
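
An element-by-element map comparison of this kind can be sketched in Python. This is not the
MAPCOMP program: the function name `compare_maps`, the representation of a map as a list of
equal-length strings, and the decision to skip pixels that are blank in the reference map are all
assumptions made for illustration.

```python
from collections import Counter

def compare_maps(reference, test, ignore=" "):
    """Compare two character maps element by element.

    Returns (table, agreement), where table[(ref, test)] is the percentage
    of compared pixels with that pair of symbols and agreement is the
    total percentage with matching symbols.
    """
    pairs = Counter()
    for ref_row, test_row in zip(reference, test):
        for r, t in zip(ref_row, test_row):
            if r != ignore:  # blanks in the reference map are not scored
                pairs[(r, t)] += 1
    total = sum(pairs.values())
    if total == 0:
        return {}, 0.0
    table = {k: 100.0 * v / total for k, v in pairs.items()}
    agreement = sum(pct for (r, t), pct in table.items() if r == t)
    return table, agreement

reference = ["MMS ", "SS M"]
test_map = ["MSS ", "S  M"]
table, agreement = compare_maps(reference, test_map)
```

In the toy maps above, four of the six scored pixels carry the same symbol in both maps. The same
arithmetic applies to the confusion tables reported below: in Table IV, for instance, the diagonal
entries 30.0 and 40.7 sum to the stated 70.7 percent agreement.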

The test results (Tables IV and V) indicated that ICAP more accurately duplicated the reference
map in locating the defoliation categories (70.7 versus 57.2 percent agreement for CLUS).
Visual comparison of the test maps revealed that both ICAP and CLUS had difficulty in resolving
the boundary between the severely and moderately defoliated categories.

B. Initial Categorization

A test procedure similar to the one described above was used to analyze data from part of a
study by Merembeck (1978). He mapped forest cover and small openings in northwestern
Pennsylvania using four-channel Landsat data. The reference signatures for the larger homogeneous
cover types were derived from training areas. Signatures for the smaller, sparsely distributed
cover types had been derived from the application of the ORSER CLUS program to the portions of the
scene left unclassified by the supervised analysis. Merembeck devised a set of 34 signatures which
he grouped into 13 categories. No accuracy assessment was performed. The goal of the test was to
map as many of these categories as possible with ICAP and CLUS, and derive the best initial
classification of the scene. The results of the unsupervised classification using ICAP and CLUS
were compared to Merembeck's results.

It was known from visual examination of the Landsat imagery that portions of the scene were
under considerable cloud cover. These areas were identified by their higher responses, typically
above 45, 45, 45, and 30 in Landsat bands 4, 5, 6, and 7, respectively. These responses were
considered to be noise and were ignored in the analysis. The test was made under the assumption
that nothing was known about the cover type categories, other than a general familiarity with
cover types in similar regions of Pennsylvania.

It was believed that as many as 10 to 15 categories might be represented in the scene, and a
resolution (R) of 20 was selected. Since no specific a priori knowledge was assumed, the
modifications performed in SCLUS were limited in scope to the reduction of noise (cloud) clusters.
After an additional pass of the data was made with DCLASS, the ICAP clustering was terminated.
The ICAP analysis took about 30 minutes of user time to complete, using 237 seconds of CPU
time, and produced 7 spectral classes.

The scene was also clustered with the ORSER CLUS program, using the default parameters.
An examination of the classification map revealed that the five clusters appeared to categorize the
data into meaningful patterns, and no further processing was done. The CLUS analysis took about
10 minutes of user time to complete and used 28 seconds of CPU time.

Comparison of Results 

The ORSER program CLASS was again used to generate three classification maps, one for each
set of signatures. The reference map was altered for comparison purposes by mapping similar
categories with the same mapping symbol. The ICAP and CLUS programs were compared (using
MAPCOMP) with versions of the reference map altered to a resolution of seven and five categories,
respectively.

The test results (Tables VI and VII) indicated that ICAP produced a higher resolution (seven 
versus five categories) and matched the reference map more accurately than CLUS (81.9 versus 
70.7 percent agreement). Visual examination of the test comparison maps revealed that the major 
difference was that ICAP more accurately located the category boundaries, particularly in the 
Northwest Aspect Forest and Small Stream categories.

V. CONCLUSIONS 

The general methodology used in cluster analysis and several of the techniques used in remote
sensing applications have been reviewed. The existing algorithms for clustering remotely sensed
data were considered to have limited flexibility and to be unable to perform selective clustering,
since the clusters are evaluated collectively, thus preventing the analyst from effectively
utilizing a priori knowledge about the data. A new procedure called ICAP was developed which
allows the user to form clusters automatically or to interactively control the clustering process.
Unlike existing procedures, this control is implemented by direct manipulation of the clusters
themselves. No processing parameters are necessary. The flexibility of ICAP was evaluated using
data from different Landsat scenes that represent two situations: one in which the user has limited
prior knowledge about the category structure and wishes to have the clusters formed more or less
automatically, and the other in which the user has a fairly complete knowledge about the existing
categories in the data and wishes to use that information to closely supervise the clustering
process. For comparison, an existing clustering method, CLUS by Turner (1972), was also applied to
the same data sets. ICAP performed appreciably better than the CLUS program in matching the
reference classification maps for the two test areas. For these scenes at least, the results
indicate that ICAP is at least as good as the CLUS procedure in terms of accuracy. The results
support the conclusion that the flexibility of ICAP can be effectively utilized to perform cluster
analysis, regardless of the amount of a priori knowledge available.

The ICAP program used more CPU and analyst time than did the CLUS program in processing
the test areas. It is difficult and perhaps unwise to draw general conclusions about the analyst
time and CPU time required for the ICAP and CLUS analyses. The amount of CPU time used is
dependent upon either the number of CLUS runs or the number of passes made through the data
in ICAP. Both of these may vary widely for any given data set, since the determination of a
satisfactory result is largely subjective. However, it would appear that ICAP offers a more
productive use of time, since the user is always in direct contact with the clustering process.
This supports a continuous learning process, unlike other procedures which function in a batch
mode, in which the user must select control parameters and wait for results.

[Figure 1: image not recoverable; surviving labels are the axis BAND 2 and its lower and upper
bounds LB2 and UB2.]

With the elliptical data distribution shown with mean at A, new centroids would be grown
at B and C.

Table I. Statistics from preprocessing the data.

                                        Dimensions
                        1      2      3      4      5      6      7      8
Mean                 18.1   17.2   53.3   28.0   17.8   15.2   58.9   [illegible]
Standard deviation    2.2    4.0    7.5    5.2    2.8    4.1    4.1    2.7
Minimum              14.0   12.0   35.0   16.0   15.0   11.0   35.0   14.0
Maximum              31.0   36.0   73.0   42.0   34.0   39.0   78.0   42.0

Table IV. ICAP confusion table indicating percentage agreement and disagreement between
categories identified by ICAP and similar categories using the reference map signatures.

Reference                     ICAP Categories
Categories      Moderate    Severe     Other     Total
Moderate            30.0       4.0      16.5      50.5
Severe               0.5      40.7       8.4      49.6
Total               30.5      44.7      24.9

Total percentage agreement = 70.7


Table V. CLUS confusion table indicating percentage agreement and disagreement between
categories identified by CLUS and similar categories using the reference map signatures.

Reference                     CLUS Categories
Categories      Moderate    Severe     Other     Total
Moderate            30.5       0.0      20.0      50.5
Severe               7.6      26.7      15.2      49.5
Total               38.1      26.7      35.2

Total percentage agreement = 57.2

Total percentage agreement = 81.9


Table VII. CLUS confusion table indicating percentage agreement and disagreement between
categories identified by CLUS and similar categories using the reference map signatures.

CLUS Categories

Total percentage agreement = 70.7



